Reversing and Smoothing the Multinomial Naive Bayes Text Classifier
Abstract
The naive Bayes text classifier has long been a core technique in information retrieval and, more recently, has emerged as a research focus in its own right in machine learning. This paper is concerned with the naive Bayes text classifier in its multinomial model instantiation. This model and an “equivalent” reversed version proposed here are interpreted under the statistical framework of log-linear modelling. The reversed version provides an alternative approach to parameter estimation, which (in a broad sense) is the main issue considered. The paper is largely devoted to studying the effects of parameter smoothing and document length normalization. For parameter smoothing, we consider a standard smoothing method for text classification and two alternative techniques that are often used in statistical language modelling for speech recognition. Empirical results are provided comparing these techniques and the effect of length normalization for both the multinomial model and its reversed version.
Similar resources
Naive Bayes and Text Classification I - Introduction and Theory
2 Naive Bayes Classification: 2.1 Overview; 2.2 Posterior Probabilities; 2.3 Class-conditional Probabilities; 2.4 Prior Probabilities; 2.5 Evidence ...
A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the rapid growth in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and a Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), in order to reduce the large feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...
Multinomial Mixture Modelling for Bilingual Text Classification
Mixture modelling of class-conditional densities is a standard pattern classification technique. In text classification, the use of class-conditional multinomial mixtures can be seen as a generalisation of the Naive Bayes text classifier relaxing its (class-conditional feature) independence assumption. In this paper, we describe and compare several extensions of the class-conditional multinomia...
Complementary Venue Recommendation Model for Yelp
Our project attempts to simplify the search process for selecting multiple venues for a single outing using Yelp. We have developed a machine learning model that recommends a complementary venue (such as a café) based on a restaurant searched by a user. Using a binary classifier, complementary venues were scored (great venue or mediocre / poor venue) based on unigrams and bigrams in review text...
Or gate Bayesian networks for text classification: A discriminative alternative approach to multinomial naive Bayes
We propose a simple Bayesian network-based text classifier, which may be considered a discriminative counterpart of the generative multinomial naive Bayes classifier. The method relies on a fixed network topology with the arcs going from term nodes to class nodes, and on a network parametrization based on noisy-OR gates. Comparative experiments of the proposed method with nai...
Publication date: 2002